Autonomous grid scheduling using probabilistic job runtime forecasting

Author

Aleksandar Lazarevic

Abstract

Computational Grids are evolving into a global, service-oriented architecture: a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose utility clusters based on the concept of user-requested job completion deadlines. In such a system, a user would be able to request that each job finish by a certain deadline, and possibly at a certain monetary cost. Implementing deadline scheduling depends on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model job execution times as a function of their historical behaviour and associated properties. Based on the resulting runtime predictions, a novel scheduling algorithm estimates the latest job start time necessary to meet the requested deadline and sorts the queue accordingly to minimise the amount of deadline overrun. The proposed approach was tested using an actual job trace collected from a production Grid facility.
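The scheduling step described above (predict each queued job's runtime from the history of its workload partition, subtract that prediction from the requested deadline to obtain the latest start time, and order the queue by that value) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the partitioning properties (`user`, `queue`, `group`), the `PartitionForecaster` class and the `order_by_latest_start` function are hypothetical names, and simple exponential smoothing stands in for the time-series models the thesis actually evaluates.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str        # hypothetical partitioning property
    queue: str       # hypothetical partitioning property
    group: str       # hypothetical partitioning property
    deadline: float  # requested completion time, in seconds on a shared clock


def partition_key(job):
    """Partition key built from three job properties (the choice of properties is assumed)."""
    return (job.user, job.queue, job.group)


class PartitionForecaster:
    """Per-partition runtime forecaster using simple exponential smoothing.

    Stand-in for the thesis's time-series models: each partition keeps a
    smoothed level of its observed runtimes, which serves as the next prediction.
    """

    def __init__(self, alpha=0.3, default_runtime=3600.0):
        self.alpha = alpha                      # weight given to the newest runtime
        self.default_runtime = default_runtime  # assumed prediction for never-seen partitions
        self.level = {}                         # partition key -> smoothed runtime

    def observe(self, key, runtime):
        """Update a partition's smoothed runtime after one of its jobs finishes."""
        if key not in self.level:
            self.level[key] = runtime
        else:
            self.level[key] = self.alpha * runtime + (1 - self.alpha) * self.level[key]

    def predict(self, key):
        return self.level.get(key, self.default_runtime)


def order_by_latest_start(queued_jobs, forecaster):
    """Pair each job with (deadline - predicted runtime) and sort ascending,
    so the job with the least slack comes first."""
    scored = [
        (job, job.deadline - forecaster.predict(partition_key(job)))
        for job in queued_jobs
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored


if __name__ == "__main__":
    fc = PartitionForecaster(alpha=0.5)
    for runtime in (100.0, 200.0):  # two finished jobs in one partition -> level 150.0
        fc.observe(("alice", "short", "physics"), runtime)
    queue = [
        Job("j1", "alice", "short", "physics", deadline=1000.0),
        Job("j2", "bob", "long", "bio", deadline=5000.0),  # unseen partition, uses default
        Job("j3", "alice", "short", "physics", deadline=900.0),
    ]
    for job, latest_start in order_by_latest_start(queue, fc):
        print(job.job_id, latest_start)
    # prints "j3 750.0", "j1 850.0", "j2 1400.0", one per line
```

Sorting by latest start time rather than by deadline alone lets a long job with a distant deadline move ahead of a short job with a nearer one when it has less slack. Falling back to a fixed default for unseen partitions is an assumed design choice; a real scheduler would need some policy for jobs with no history.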
The best-performing execution time predictor (the auto-regressive moving average method), coupled with workload partitioning based on three simultaneous job properties, returned a median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
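The two figures of merit quoted above, the median absolute percentage error (MdAPE) of the runtime predictions and the average deadline overrun, can be computed along the lines of the sketch below. All function names are hypothetical. The AR(1) forecaster is a deliberately simplified stand-in for the auto-regressive moving average (ARMA) method named in the abstract: it has no moving-average term, and a full ARMA fit would normally be delegated to a statistics library. Because the abstract does not define the MdAPE "centroid", the sketch computes a plain MdAPE over a single series of runtimes.

```python
import statistics


def ar1_forecast(history):
    """One-step-ahead AR(1) forecast fitted by least squares on the mean-centred series.

    Simplified stand-in for ARMA: there is no moving-average term.
    """
    if not history:
        raise ValueError("history must be non-empty")
    mu = statistics.fmean(history)
    if len(history) < 3:
        return mu  # too short to estimate a lag coefficient
    x = [v - mu for v in history]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    phi = num / den if den else 0.0  # lag-1 coefficient; 0 for a constant series
    return mu + phi * x[-1]


def mdape(actual, predicted):
    """Median absolute percentage error, in percent. Actual runtimes must be positive."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual runtimes must be positive")
    errors = [abs(p - a) * 100.0 / a for a, p in zip(actual, predicted)]
    return statistics.median(errors)


def mean_deadline_overrun(finish_times, deadlines):
    """Average lateness over all jobs; jobs that finish on time contribute zero."""
    if not finish_times or len(finish_times) != len(deadlines):
        raise ValueError("need two non-empty sequences of equal length")
    overruns = [max(0.0, f - d) for f, d in zip(finish_times, deadlines)]
    return sum(overruns) / len(overruns)


def rolling_mdape(runtimes, warmup=3):
    """Forecast each runtime from the ones before it (after a warm-up), then score with MdAPE."""
    predictions = [ar1_forecast(runtimes[:t]) for t in range(warmup, len(runtimes))]
    return mdape(runtimes[warmup:], predictions)
```

Using the median rather than the mean of the percentage errors keeps a few badly mispredicted jobs, which are common in highly variable Grid workloads, from dominating the accuracy figure.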


Similar resources

Managing Uncertainty: A Case for Probabilistic Grid Scheduling

The Grid technology is evolving into a global, service-orientated architecture – a universal platform for delivering future high demand computational services. Strong adoption of the Grid and the utility computing concept is leading to an increasing number of Grid installations running a wide range of applications of different size and complexity. In this paper we address the problem of deliver...


A Grid Based System for Data Mining Using MapReduce

In this paper, we discuss a Grid data mining system based on the MapReduce paradigm of computing. The MapReduce paradigm emphasizes system automation of fault tolerance and redundancy, while keeping the programming model for the user very simple. MapReduce is built closely on top of a distributed file system, that allows efficient distributed storage of large data sets, and allows computation t...


Representing Job Scheduling for Volunteer Grid Environment using Online Container Stowage

Volunteer grid computing comprises volunteer resources which are unpredictable in nature and as such the scheduling of jobs among these resources could be very uncertain. It is also difficult to ensure the successful completion of submitted jobs on volunteer resources as these resources may opt to withdraw from the grid system anytime or there might be a resource failure, which requires job ...


A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that controls a huge number of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids only focus on one kind of Grid jobs which can be data...


A Secure Dynamic Job Scheduling on Smart Grid using RSA Algorithm

Grid computing is a computation methodology using a group of clusters connected over high-speed networks that involves coordinating and sharing computational power, data storage and network resources. Integrating a set of clusters of workstations into one large computing environment can improve the availability of computing power. The goal of scheduling is to achieve the highest possible system throu...



Journal title:

Volume, Issue:

Pages: -

Publication date: 2008
